Cream of the Crop 22

home *** CD-ROM | disk | FTP | other *** search

/ Cream of the Crop 22 / Cream of the Crop 22.iso / program / ctlib100.zip / INSTALL.LZH / BTREES.TXT < prev next >

Wrap

Text File | 1996-10-12 | 12KB | 191 lines

CHAPTER 12 B Trees This chapter discusses in detail the B-tree objects included in the Containers Library. It describes how to use these objects and the different types of B-trees available. In this chapter you will learn: - What are B and B+ Trees? - How to use the B and B+ Trees - The characteristics of each type of B Tree General overview B-trees are ideal for maintaining very large indexes that cannot be stored completely in memory. They have a wide variety of applications, particularly in situations where access to very large files in sequential as well as key order is required. All B-trees are paginated structures, meaning that they read complete blocks (pages) of data into the memory at once. Therefore, with very few accesses to the file, you can obtain a large amount of records from the data set. Also, B-trees are always kept almost completely balanced, so there is no risk for potentially large search trajectories. A B-tree stores the key and its respective data together in the node. Key/data couples are distributed in all nodes in the tree, so the information can be found in any level of the B tree. This differs from B+ trees in which all searches are required to reach the last level of the tree. In B+ trees, all key and data information is stored in a linked sequence of blocks, located in the last level of the tree. Indexed access to this set of sequences is provided by an index structure that consists of copies of the actual keys, which represent the limits of blocks in the set of sequences. All B-trees in the Containers Library provide dynamic page buffering for very fast access to the data, and perform complete balancing (redistribution) of pages during insertions and deletions, optimizing the use of space and maximizing performance. What are B Trees? A search tree of m-ways, is a tree in which every node has an external degree (the number of pointers to other nodes) less than or equal to m. A non-empty search tree of m-ways has the following characteristics: - Each node in the tree has the structure: n ; P0, K0, P1, K1, ..., Pm-1, Km-1, Pm where P0, P1, ..., Pm are pointers to the subtrees of the node and K0, K1, ..., Km-1 are the keys (or data items). - All key values of nodes in the subtree pointed by Pi are smaller than the value of Ki, for i=0, 1, ..., m-1. - All key values of nodes in the subtree pointed by Pm are greater than the value of Km-1. - Subtrees pointed by Pi, i=0, 1, ..., m are also search trees of m-ways. The maximum length of a search is given by the height of the tree. In general, the greater the ramification of the tree (m), the smaller the maximum length of a search will be. An optimum search tree of m-ways and n-keys has a maximum length for a search of logm(n+1). A B-tree of degree m, is a search tree of m-ways with the following characteristics: - Every node of the tree, except the root and the leaves, has at least (m/2) subtrees and no more than m subtrees. - The root of the tree has at least two subtrees unless the root itself is a leaf. - All the leaves of the tree are at the same level. What are B+ Trees? A B+ tree is a data structure similar to a B-tree, which provides sequential indexed access to the data. In a B+ tree, all the keys and the respective records are stored in a linked set of data blocks know as a set of sequences. The indexed access to this set of sequences is provided by a conceptually separated structure know as the index set. In a B+ tree, the index set consists in a B-tree that stores copies of keys that represent the limits between data blocks in the set of sequences. These copies of the actual keys are called separators since they separate one block in the set of sequences, from its predecesor. There are two significative advantages of using a B+ tree over a B tree: - The set of sequences can be processed in a truly sequential way, providing efficient access to records in key order. - Using of separators instead of full records in the index set, often means that the number of separators that can be stored in one node of the index set of a B+ tree, substantially exceeds the number of records that can be stored in one node of a B tree of the same size. The separators (which are copies of the actual keys) are smaller than the key/data couples stored in a B tree, so you can store more keys in a node of a B+ tree. Since the number of descendant nodes is larger, a B+ tree frequently will produce a tree with a smaller height than the one a B tree would produce. In practice, the last of these two advantages is the most important. The importance of the first one is usually diminished by the fact that you can obtain acceptable performance when sequentially traversing a B tree by using buffers. B-trees are more attractive when the key itself makes up most of each record stored in the tree. However, when the key is only a small part of the record, it is possible to build a wider, lower tree using a B+ tree. Using the B Trees - B trees are completely stream based containers, meaning that they can only exist in a stream. Usually, the stream that you use will be a TDosStream or TBufStream, since B trees are mostly used to store large sets of data that you need to have access to in the future. You can use B trees to store non-dynamically allocated non-object data (e.g. records) or dynamically allocated objects; all you have to do is choose the appropriate object for the data you wish to store. If you want to store records, for instance, you can use the TBTree and TBPlusTree containers, while if what you want to store are objects, you would use instead the TObjectBTree and TObjectBPlusTree containers. Standard B Trees Before using any B tree, you need to override the tree's KeyOf method, so that it knows how you want to sort the data in the tree. As our data type we will use the TContact record structure used in previous examples. type PContactList = ^TContactList; TContactList = object(TBTree) function KeyOf(Item: Pointer): Pointer; virtual; end; function TContactList.KeyOf(Item: Pointer): Pointer; begin KeyOf := @PContact(Item)^.LastName; end; To begin using a B tree, the first thing you need to do is to create the stream where you will store the data: Stream := New(PBufStream, Init('btrees.dat', stCreate, 1024)); Once the stream has been opened, you can proceed to initialize the B tree. To do this, you will need to provide the following information to the B tree's Init method: - Degree of the tree: number of items plus one, that will be stored in one "page" or node of the tree. - Data size: size of the data as stored in the stream. - Size of the buffer: number of "pages" or nodes that will be kept in memory. The minimum valid value is 5. If you are using a B+ Tree instead of a normal B Tree, you must also provide the following: - Size (in number of items) of each data block - Size of the data blocks buffer; the minimum valid value is 2. Here is how you would initialize a B tree of degree 3 and a buffer of 5 (the minimum): ContactList := New(PContactList, Init(3, SizeOf(TContact), Stream, 5)); A B+ tree of degree 2, with a block size of 3, a node buffer of 5 and a data block buffer of 2, would be initialized using the following statement: ContactList := New(PContactList, Init(2, 3, SizeOf(TContact), 20, Stream, 5, 2)); Inserting data into a B tree is just like inserting data into any other container: simply call Insert with a pointer to the data that you wish to store in the tree: with ContactList^ do begin SetContactValues('Lewis', 'Carl', '(506) 83-780', 'Running, Corp.', Contact); Insert(@Contact); SetContactValues('Benton', 'Michael', '(403) 33-973', 'ER, Inc.', Contact); Insert(@Contact); SetContactValues('Wagner', 'Robert', '(906) 11-230', 'Symphony, Ltd.', Contact); Insert(@Contact); SetContactValues('Smith', 'John', '(656) 75-843', 'InterComm, Corp.', Contact); Insert(@Contact); end; Although using a B tree is just like using any other container, there are several important considerations about B trees that you must consider: - They swap items from memory to the stream as needed. That means that you can be sure only that the item you just retrieved is going to be in memory. If you call another method in the B tree, that item might no longer be in memory and the pointer returned by the tree will not be valid. If you need to keep the previous item in memory, you will need to call the CopyItem method to create a copy of the item. Note that if you are storing objects and the objects contain pointers to other data in the heap, you will need to override this method to copy this data as well. - Every time you modify the data of an item in the tree, you must call the tree's Update method. This will force the tree to store the data in the stream before disposing of it. If you don't call Update, the data in the stream will not be updated with the new changes. Although B trees are stream based containers, they do not require the use of DoneItem. This is because the trees swap items in and out of memory as needed, disposing of items for you. To dispose of the B tree, simply call its Done destructor. The stream used by the tree, however, will not be disposed of. Therefore, unless you want to continue using the stream in your application, you must also dispose of the stream: Dispose(ContactList, Done); Dispose(Stream, Done); Object B trees When using B trees to store objects, you must remember to register the object for stream access. Otherwise, you will get a "Put of unregistered object" error when the tree tries to swap a page to the stream. Also, if your object contains pointers to other data, you must be careful when determining the size of the data as stored in the stream. Suppose you want to store the following object in a B tree: type PContact = ^TContact; TContact = object(TObject) FirstName, LastName, Phone, Company : PString; constructor Init(ALastName: String20; AFirstName: String15; APhone: String18; ACompany : String25); constructor Load(var S: TStream); destructor Done; virtual; procedure Store(var S: TStream); end; Here, you can't use SizeOf(TContact) as the size of the data, since this would only return the size of the object in memory (2 + 4 * SizeOf(Pointer)). Instead, you must determine the maximum size of the item when stored in a stream, and pass that size as a parameter to the B tree's Init method. In our case, we will assume that the strings in the object have the following maximum sizes: FirstName 15 LastName 20 Phone 18 Company 25 Therefore, the size of the object (as stored in a stream) would be: 16 + 21 + 19 + 26 = 82 Opening an existing tree Once you have created a tree in a file on disk, you must not call the tree's Init method to reopen it, since this would create a completely new tree, overwriting the old one. To open a tree that has already been created, you must use the Open constructor. Open will take as parameters the stream where the tree is stored and the size of the buffer or buffers that will be used: { Open the stream } Stream := New(PBufStream, Init('btrees.dat', stOpen, 1024)); { Open a B tree } ContactList := New(PContactList, Open(Stream, 5)); { Open a B+ tree } ContactList := New(PContactList, Open(Stream, 5, 2)); Notice how we open the stream in read and write mode. You must not use the stCreate file access mode, or you will overwrite the existing file with a new empty one.